📊 Real-Time Product Clickstream Analytics with Kafka, Spark & Airflow

🔍 Overview

This project simulates a real-time e-commerce clickstream analytics pipeline. It captures, processes, and visualizes product click data using a modern big data stack, demonstrating how real-time streaming and interactive dashboards power decision-making in scalable systems.

🧭 Approach

We built an end-to-end architecture: Kafka producers emit synthetic clickstream events, Spark Structured Streaming aggregates product views in 10-second windows, and Apache Airflow schedules the workflow. The data is stored as both Parquet and CSV and is then visualized in a Flask dashboard and Tableau.
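
To make the 10-second windowing concrete, here is a minimal sketch in plain Python. It is not the project's Spark job: it only illustrates how a tumbling window groups events by time bucket, and it runs without Kafka or Spark. The event fields (`product_id`, `timestamp`), the product names, and the helper names `make_events` and `window_counts` are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

WINDOW_SECONDS = 10  # window length stated in the approach above


def make_events(n, start_ts=0.0, seed=42):
    """Generate n synthetic click events (hypothetical schema)."""
    rng = random.Random(seed)
    products = ["p1", "p2", "p3"]  # illustrative product ids
    events = []
    ts = start_ts
    for _ in range(n):
        ts += rng.uniform(0.0, 2.0)  # random gap between clicks, in seconds
        events.append({"product_id": rng.choice(products), "timestamp": ts})
    return events


def window_counts(events, window_seconds=WINDOW_SECONDS):
    """Count views per product in non-overlapping (tumbling) windows.

    Returns a dict mapping each window start time to {product_id: count}.
    """
    counts = defaultdict(Counter)
    for event in events:
        # Floor the timestamp to the start of the window it falls in.
        window_start = int(event["timestamp"] // window_seconds) * window_seconds
        counts[window_start][event["product_id"]] += 1
    return {start: dict(c) for start, c in sorted(counts.items())}


if __name__ == "__main__":
    for start, per_product in window_counts(make_events(30)).items():
        print(f"[{start}s, {start + WINDOW_SECONDS}s): {per_product}")
```

Each event lands in exactly one window, so the per-window counts always sum to the number of events. In a streaming engine the same grouping is applied continuously as events arrive, rather than over a finished list.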

⚙️ Methodologies

🧰 Technologies

💡 Key Learnings

📈 Results

The pipeline processed and aggregated clickstream data in real time. The dashboards displayed popular products and time-based trends with minimal latency. Tableau supported deeper analysis, while Flask served as a real-time monitoring UI.